(Approximate) Frequent Item Set Mining Made Simple with a Split and Merge Algorithm
Abstract
In this paper we introduce SaM, a split and merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it very easy to implement, but also fairly easy to execute on external storage, thus rendering it a highly useful method if the data to mine cannot be loaded into main memory. Furthermore, we present extensions of this algorithm, which allow for approximate or “fuzzy” frequent item set mining in the sense that missing items can be inserted into transactions with a user-specified penalty. Finally, we present experiments comparing our new method with classical frequent item set mining algorithms (like Apriori, Eclat and FP-growth) and with the approximate frequent item set mining version of RElim (an algorithm we proposed in an earlier paper and improved in the meantime).
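The split and merge scheme named in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the abstract does not give the details, so the sketch assumes the usual divide-and-conquer reading (split off all transactions that start with the leading item, mine them recursively, then merge them back into the rest). The names `sam`, `_merge` and `_recurse`, the tie-breaking rule for item order, and the representation of the database as a sorted list of (item tuple, weight) pairs are all hypothetical choices made for illustration. The approximate ("fuzzy") extension is not covered.

```python
from collections import Counter


def _merge(a, b):
    """Merge two sorted lists of (items, weight), adding weights of equal entries."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            out.append(a[i])
            i += 1
        elif a[i][0] > b[j][0]:
            out.append(b[j])
            j += 1
        else:
            out.append((a[i][0], a[i][1] + b[j][1]))
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def _recurse(db, prefix, smin, order, result):
    """Process a sorted list of (item-rank tuple, weight) pairs."""
    while db:
        lead = db[0][0][0]          # leading item of the first transaction
        support, split, k = 0, [], 0
        # Split step: collect transactions starting with `lead`, drop `lead`.
        while k < len(db) and db[k][0][0] == lead:
            items, weight = db[k]
            support += weight
            if len(items) > 1:
                split.append((items[1:], weight))
            k += 1
        # Merge step: the database with `lead` removed everywhere.
        rest = _merge(split, db[k:])
        if support >= smin:
            itemset = prefix + (order[lead],)
            result[frozenset(itemset)] = support
            _recurse(split, itemset, smin, order, result)
        db = rest


def sam(transactions, smin):
    """Return {frozenset(items): support} for all item sets with support >= smin."""
    freq = Counter(item for t in transactions for item in set(t))
    # Assumed item order: ascending frequency, ties broken by name.
    order = sorted((i for i in freq if freq[i] >= smin),
                   key=lambda i: (freq[i], str(i)))
    rank = {item: r for r, item in enumerate(order)}
    weights = Counter()
    for t in transactions:
        key = tuple(sorted(rank[i] for i in set(t) if i in rank))
        if key:
            weights[key] += 1       # equal transactions collapse into one entry
    result = {}
    _recurse(sorted(weights.items()), (), smin, order, result)
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(sam(data, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

The only data structure in this sketch is a flat sorted list, and both the split and the merge are single sequential passes over it, which is what makes a scheme of this kind plausible to run against external storage.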
Similar resources
SaM: A Split and Merge Algorithm for Fuzzy Frequent Item Set Mining
This paper presents SaM, a split and merge algorithm for frequent item set mining. Its distinguishing qualities are an exceptionally simple algorithm and data structure, which not only render it easy to implement, but also convenient to execute on external storage. Furthermore, it can easily be extended to allow for “fuzzy” frequent item set mining in the sense that missing items can be inserte...
Improved Frequent Pattern Mining Algorithm using Divide and Conquer Technique with Current Problem Solutions
Frequent patterns are patterns such as item sets, subsequences or substructures that appear in a data set frequently. A Divide and Conquer method is used for frequent item set mining. Its core advantages are an extremely simple data structure and processing scheme. The original dataset is divided into projected databases, from which the frequent patterns are found. Split and Merge uses...
Simple Algorithms for Frequent Item Set Mining
In this paper I introduce SaM, a split and merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it quite easy to implement, but also very convenient to execute on external storage, thus rendering it a highly useful method if the transaction database to mine cannot be loaded into main memory. Furtherm...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we want to improve association rules in order to be used in recommenders. Recommender systems present a method to create the personalized offers. One of the most important types of recommender systems is the collaborative filtering that deals with data mining in user information and offering them the appropriate item. Among the data mining methods, finding frequent item sets and ...
Web Usage Mining in Online Social Network
Web content at present mainly consists of social media systems such as blogs, photo and link sharing sites and on-line forums. Web Usage Mining is the application of data mining techniques in the field of social networks to discover interesting usage patterns from SNS data and to better serve the needs of SNS applications. The major use of web usage mining techniques...